Integrating plasma proteomics with genome-wide association data to identify novel drug targets for inflammatory bowel disease

Inflammatory bowel disease (IBD) is a chronic disease that includes Crohn’s disease (CD) and ulcerative colitis (UC). Although genome-wide association studies (GWASs) have identified many relevant genetic risk loci, the impact of these loci on protein abundance and their potential utility as clinical therapeutic targets remain uncertain. Therefore, this study aimed to investigate the pathogenesis of IBD and identify effective therapeutic targets through a comprehensive and integrated analysis. We systematically integrated GWAS data on IBD, UC and CD (N = 25,305) from the study by de Lange et al. with human blood proteome data (N = 7,213) from the Atherosclerosis Risk in Communities (ARIC) study. Proteome-wide association study (PWAS), Mendelian randomisation (MR) and Bayesian colocalisation analysis were used to identify proteins contributing to the risk of IBD. Integrative analysis revealed that genetic variation associated with IBD, UC and CD affected the abundance of five (ERAP2, RIPK2, TALDO1, CADM2 and RHOC), three (VSIR, HGFAC and CADM2) and two (MST1 and FLRT3) cis-regulated plasma proteins, respectively (P < 0.05). Among the proteins identified via Bayesian colocalisation analysis, CADM2 was shared between IBD and UC. A drug and five druggable target genes were identified from DGIdb after Bayesian colocalisation analysis. Our findings from genetic and proteomic approaches identify compelling proteins that may serve as important leads for future functional studies and as potential drug targets for IBD (UC and CD).

Inflammatory bowel disease (IBD) is a chronic and relapsing gastrointestinal disorder. Although the pathogenesis of IBD remains unclear, it may be driven by a genetic predisposition 1. IBD primarily includes Crohn's disease (CD) and ulcerative colitis (UC). The clinical manifestations of CD are more heterogeneous than those of UC. In particular, CD can affect any part of the gastrointestinal tract, especially the terminal ileum or the perianal region. Unlike CD, UC is limited to the large intestine, especially the peri-appendiceal region 2,3. Despite substantial advances in the treatment of IBD in the past few years, the standard treatment protocols currently available have limited therapeutic efficacy, with disadvantages such as inadequate response to drugs and the development of drug resistance or treatment failure over time. Therefore, it is necessary to identify therapeutic targets to facilitate the development of new drugs.
During the past decade, large-scale genome-wide association studies (GWASs) have identified various nonoverlapping genetic risk loci that can be screened for candidate target genes for drug development. However, the identified variants explain only a minority of the genetic risk 4. Developing these genetic risk loci into drug target genes remains a significant challenge. de Lange et al. identified 25 novel loci through a meta-analysis of GWAS data including 25,305 individuals and provided further insights into the possible mechanisms underlying known therapeutic strategies 5. However, GWASs have rarely identified the causal genes that mediate the effects of variation on traits 6. Therefore, Virginia et al. identified 39 novel susceptibility genes that influence the pathogenesis of IBD by correlating gene expression with characteristics and disease using a new statistical method for a transcriptome-wide association study (TWAS) 7. Although previous studies have revealed genetic loci associated with the pathogenesis of IBD, their results were based on genomic or transcriptomic (messenger RNA [mRNA]) analyses instead of proteomic analysis. Proteins are the ultimate products of gene expression and may act as potential drug targets and biomarkers 8.
PWASs have attracted substantial interest in recent years. In PWASs, signals from all variants that collectively affect a protein-coding gene are integrated, and machine learning and predictive models are used to assess their overall implication in protein function 9. Zhang et al. combined gene and protein expression data to propose the first PWAS framework based on the plasma proteome and identified three specific genetic variants, thereby revealing potential target proteins for the treatment of disease 10. Altogether, PWAS is a comprehensive, validated and reliable analytical approach.
In this study, we integrated high-throughput proteomic and genetic data derived from plasma samples to identify proteins associated with IBD susceptibility and potential therapeutic targets for IBD. A PWAS was performed to investigate the GWAS and protein quantitative trait locus (pQTL) data of IBD. Independent Mendelian randomisation (MR) was performed to verify the causal relationship between the identified proteins and IBD. Finally, Bayesian colocalisation analysis was performed to examine whether the GWAS and pQTL data were consistent with shared causal variants, and drugs and druggable target genes were identified through DGIdb. To the best of our knowledge, this PWAS is the first to report on proteins associated with IBD susceptibility, providing new insights into potential therapeutic approaches for the disease.

Associated plasma proteins identified in PWAS
We conducted a PWAS by integrating GWAS data of IBD, UC and CD with data on 1,348 plasma proteins using the FUSION pipeline. The abundance of 62, 21 and 30 cis-regulated plasma proteins was significantly associated with IBD, UC and CD, respectively. Of these proteins, 4 were common among IBD, UC and CD. In addition, 15, 17 and 6 proteins were common between IBD and UC, between IBD and CD and between UC and CD, respectively. Each of these associations had an FDR of < 0.05. Detailed information on plasma proteins associated with IBD, UC and CD is presented in Fig. 1, Table 1 and Supplementary Table S2.
In addition to the aforementioned causal proteins, there are potential risk proteins to consider. First, during the screening of instrumental variables, the thresholds for IL23R, HINT1 and MFNG may have been too stringent, preventing the SNPs of these proteins from being used as instrumental variables to explore the causal relationship between IBD and proteins. The same situation occurred with IL23R, HINT1, RIPK2 and HSPA1A in CD. By contrast, all PWAS-significant proteins for UC could be used as robust instrumental variables for MR analysis. Second, under the multiple-testing correction, some proteins passed the threshold of P < 0.05 but did not pass the FDR < 0.05 correction. These include TNFSF15, HGFAC, HYAL1, KLB, HDGF, FCN1, C10orf54, HEBP1, ABO and MAN2B2 in IBD; IL1R2 and PARK7 in UC; and PPIH, GKN and SERPINF2 in CD. Additionally, some proteins did not show a significant causal relationship, but the direction of their effects on the disease was consistent with that in the PWAS. These include IL12B, STAT3, FCGR3B, IL1RL1, LRRC32, C2, CD274, CRK, NOG, NCF1, CHRDL2, LY75 and ITLN1 in IBD; STAT3, MICB, HLA-DQA2, IL23R, FCGR3B, FCGR3A, AGER and PCOLCE2 in UC; and C2, TNFSF15, APOM, ADK, IL1RL1, TNFSF8, LRRC32, CFB and C9 in CD.

Colocalisation of plasma proteins associated with disease risk
To verify genetic colocalisation, the posterior probability (PP) was evaluated to identify shared causal variants between pQTL and IBD GWAS data for genes that met the FDR-corrected P-value threshold in the MR analysis. A posterior probability for hypothesis 4 (PPH4) of ≥ 0.75 was taken as evidence that the GWAS and pQTL data shared a causal variant. Based on this threshold, 5, 3 and 2 proteins were found to play an important role in the progression of IBD, UC and CD, respectively (Figs 2 and 4). Among the proteins identified via colocalisation analysis, CADM2 is an important shared protein between IBD and UC. Proteins with PPH3 > 0.7 are also of significant interest. These include 14 proteins in IBD, 3 in UC and 3 in CD. Specifically, the proteins in IBD are FCGR2A, PARK7, AIF1, MXRA8, IL1R2, NADK, LY9, PIGR, PLAU, PLCG2, ICAM5, FCGR2B, EPHB4 and AGER. For UC, the proteins are AIF1, PRKCB and PIGR. In CD, the proteins are C7, IRF3 and TNFRSF1A.
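The posterior probabilities used above can be illustrated with a minimal sketch. It assumes the widely used approximate Bayes factor formulation of colocalisation (one causal variant per trait per locus) and prior values p1 = p2 = 1e-4 and p12 = 1e-5; the text does not state which software or priors were used, so these choices, the function names and the toy input values are illustrative assumptions only.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def logdiffexp(a, b):
    """log(exp(a) - exp(b)) for a > b."""
    return a + math.log1p(-math.exp(b - a))

def coloc_posteriors(labf_pqtl, labf_gwas, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4] for one locus.

    labf_pqtl, labf_gwas: per-SNP log approximate Bayes factors for the
    pQTL and GWAS traits, in the same SNP order (at least two SNPs).
    Priors are assumed defaults, not values taken from the study.
    """
    s1 = logsumexp(labf_pqtl)
    s2 = logsumexp(labf_gwas)
    s12 = logsumexp([a + b for a, b in zip(labf_pqtl, labf_gwas)])
    log_h = [
        0.0,                                                      # H0: no association
        math.log(p1) + s1,                                        # H1: pQTL only
        math.log(p2) + s2,                                        # H2: GWAS only
        math.log(p1) + math.log(p2) + logdiffexp(s1 + s2, s12),   # H3: distinct variants
        math.log(p12) + s12,                                      # H4: shared variant
    ]
    total = logsumexp(log_h)
    return [math.exp(x - total) for x in log_h]

def shares_causal_variant(posteriors, threshold=0.75):
    """Apply the PPH4 >= 0.75 rule used in the text."""
    return posteriors[4] >= threshold
```

In this sketch, a locus where both traits peak at the same SNP yields a high PPH4, whereas a locus where the strongest signals sit on different SNPs shifts probability towards PPH3.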

Drug prediction analysis
As most drugs exert their therapeutic effects by targeting proteins, we finally explored whether the 10 proteins identified through the comprehensive analysis could serve as potential therapeutic targets. Four potential targets for drug therapy, namely ERAP2, RIPK2, CADM2 and VSIR, were prioritised from DGIdb through drug-gene interactions. Through druggability exploration, we identified the ERAP2 inhibitor Tosedostat as a potentially effective drug for IBD. These findings are expected to facilitate the development of specific drugs for IBD, UC and CD.

Discussion
In this PWAS, we systematically identified plasma proteins associated with IBD (including UC and CD) through MR and Bayesian analyses to identify potential drug targets. A total of 62, 21 and 30 causal proteins were found to be associated with the risk of IBD, UC and CD, respectively. IBD and UC shared 15 causal proteins, whereas IBD and CD shared 17 causal proteins. Among these proteins, 4 (MST1, IL23R, STAT3 and HGFAC) are common causal proteins associated with the risk of IBD, UC and CD. Colocalisation analysis revealed plasma proteins with higher confidence levels (including 9 plasma proteins), among which CADM2 may play a crucial [...] 11. However, given the limited relationship between GWAS-based interpretation of important genes and the pathogenesis of IBD, it is necessary to understand the genetic factors associated with IBD. TWASs link gene expression with genetic susceptibility to disease. In a previous TWAS, more than 250 candidate susceptibility genes were identified in all tissue/cell types, mostly specific to the colonic subsites (ascending, transverse and descending colon) 7.
Although the abovementioned GWASs and TWASs have been conducted for almost a decade and the relevant research results have been gradually translated into clinical practice, the pharmacological treatment of IBD (UC and CD) remains unsatisfactory, and heterogeneity in the clinical manifestations of the disease makes it difficult to find the best treatment option suitable for all patients 12. Plasma proteins have superior potential as diagnostic/prognostic biomarkers and therapeutic targets, as they play a critical role in human health and disease 13. Therefore, understanding the genetic regulation of the proteome may help to identify causal mechanisms underlying complex traits.
Cell adhesion molecule 2 (CADM2), a member of the immunoglobulin superfamily and an IBD-associated plasma protein identified in this study, is encoded at a pivotal genetic locus associated with various metabolic traits such as obesity and alcohol use 14,15. Yan et al. reported that deletion of Cadm2 in mice with obesity (Cadm2/ob) reduced adiposity and systemic glucose levels, suggesting a relationship between the expression of CADM2 and weight gain 16. Furthermore, a cross-sectional study reported that approximately 15-40% of adults with IBD had obesity, and an additional 20-40% of adults with IBD were overweight 17. Jostins et al. performed weighted gene co-expression network analysis of loci associated with IBD susceptibility and found that the most significantly enriched module contained 523 genes present in the omental adipose tissue of patients with morbid obesity 18. Many inflammatory and pro-inflammatory factors are associated with the risk of adverse outcomes of obesity and obesity-related diseases. Specifically, free fatty acids (FFAs) can activate Toll-like receptor (TLR) 4 on the plasma membrane of macrophages, resulting in increased nuclear factor kappa B-dependent expression of proinflammatory genes, including TNF-α 19. In addition, TLR4 present on immune cells, such as monocytes and neutrophils, and non-immune cells, such as adipocytes and endothelial cells, can activate systemic and local inflammatory responses by binding to lipopolysaccharide (LPS), resulting in the dysregulation of metabolic homeostasis and damage to the intestinal mucosal barrier 20,21. The mucosal barrier is a critical defence against external attack, and its integrity is essential for effective relief of the symptoms of IBD and UC. Therefore, the findings of this study suggest that CADM2 deficiency increases the risk of IBD and UC.
Endoplasmic reticulum aminopeptidase 2 (ERAP2), another plasma protein associated with the risk of IBD, is involved in antigen processing and presentation of endogenous peptide antigens via major histocompatibility class I (MHC-I) molecules 22. It has been reported to be associated with IBD in a previous study 18, which is consistent with the findings of this study. To date, only two aminopeptidases (ERAP1 and ERAP2) have been identified in the lumen of the endoplasmic reticulum (ER) 23. ERAP trims antigenic peptide precursors and assists in their presentation by MHC-I molecules; in turn, peptides presented by MHC-I critically affect antigen presentation to T cells and recognition by NK cells. Therefore, ERAP further contributes to the progression of IBD by affecting innate and adaptive immunity 24,25. Previous GWASs have identified multiple IBD-associated loci that are involved in the specific enrichment of multiple immune cells, including NK cells, and functional dysregulation of the immune system 18. The role of ERAP2 in the pathogenesis of IBD warrants further investigation. In addition, inhibitors of aminopeptidases (Tosedostat) are promising drugs for the treatment of IBD 26.

Figure 4. Genetic colocalisation for IBD (A-E), UC (F-H) and CD (I-J): (A) ERAP2, (B) RIPK2, (C) TALDO1, (D) CADM2, (E) RHOC, (F) HGFAC, (G) VSIR, (H) CADM2, (I) MST1 and (J) FLRT3. Each dot is a genetic variant. The SNP with the most significant P value for IBD, UC or CD is marked, and the colours of the other SNPs reflect their linkage disequilibrium (r2) with it. SNPs with missing linkage disequilibrium information are also coded dark blue. In the LocusZoom plots, -log10 (P.gwas) for association with IBD, UC and CD risk is on the x-axes, and -log10 (P.pqtl) for association with protein levels is on the y-axes.

The significant molecular roles of other genes associated with IBD, UC and CD include autophagy in dendritic cells (DCs) and the lectin pathway of complement-MBL. DCs in the intestine play a critical role in inducing immune tolerance and maintaining homeostasis 27. Receptor-interacting serine-threonine kinase 2 (RIPK2) induces autophagy in DCs through nucleotide-binding oligomerisation domain 2 (NOD2), which is primarily involved in bacterial processing and generation of MHC-II antigen-specific CD4(+) T cells 28. Therefore, RIPK2 may act as a protective factor in IBD and alleviate its symptoms 29,30. Transaldolase 1 (TALDO1) is a rate-limiting enzyme of the pentose phosphate pathway, and its antibodies are not present in autoimmune diseases except multiple sclerosis. Therefore, the deficiency of TALDO1 may be associated with an increased risk of critical illness 31. Furthermore, Ras homologue family member C (RHOC) is an important plasma protein associated with IBD susceptibility. It plays an essential role in tumour cell motility and metastasis formation 32. IBD may be associated with immune-mediated mechanisms. In this study, ERAP2, RIPK2, CADM2 and VSIR were identified as druggable target genes using DGIdb, providing novel insights into the development of drugs for treating IBD 33.
However, this study has some limitations. First, gene expression is a highly complex process that is influenced by multiple factors such as the environment; however, the proteomic analysis in this study was limited to pQTL data from individuals of European origin, which may limit the applicability of the results to non-European populations. Second, the primary data in this study were obtained from the plasma proteome of the ARIC cohort, which relied strongly on imputation-based approaches for genomic data and did not involve other relevant tissue systems. Consequently, there may be unique pQTLs that were not captured in our study. Moreover, the effects of uncommon and rare variants and of complex trans-associations remain unknown; these may play a significant role in heritability and should be investigated in future studies with larger sample sizes.
In conclusion, this study revealed nine protein biomarkers that may contribute to the pathogenesis of IBD (UC and CD).The findings of this study may provide a theoretical basis for future studies on genetic and functional mechanisms, which may help to develop novel therapeutic strategies for IBD (UC and CD).

Methods

pQTL data
Circulating proteins (cis-pQTL) were identified based on data on 4,657 plasma proteins from 7,213 European American (EA) individuals recruited by the Atherosclerosis Risk in Communities (ARIC) study 10. The study computed the weights of imputation models based on individuals from the Phase 3 1000 Genomes Project and used their genotypic data for the weight models. A total of 1,348 unique proteins or protein complexes encoded by autosomal genes were identified using the probabilistic estimation of expression residual (PEER) method, and their data were subjected to rank-based inverse normal transformation and quality control. Genome-wide summary-level statistics for all single-SNP cis-pQTL analyses, irrespective of significance level, and the data required to perform PWAS are available from http://nilanjanchatterjeelab.org/pwas.

GWAS data
The publicly available GWAS data of IBD, including UC and CD, were obtained from a recent study by de Lange et al. The study included 25,305 individuals of European ancestry (12,160 patients with IBD and 13,145 control individuals) from the UK IBD Genetics Consortium (UKIBDGC) and UK10K Consortium 5. After quality control, the data on 296,203 variants from 4,474 patients with Crohn's disease, 4,173 patients with ulcerative colitis, 592 patients with unclassified IBD and 9,500 control individuals were eventually included for analysis. Association summary statistics are available from ftp://ftp.sanger.ac.uk/pub/project/humgen/summary_statistics/human/2016-11-07/.

Quality control of GWAS data
Only the data of individuals of European ancestry were included in this study. The data were subjected to stringent quality control as follows: (1) aligning data to the hg19 human reference genome; (2) retaining biallelic autosomal SNPs; (3) removing SNPs with a duplicated or missing rs ID; (4) removing SNPs with a minor allele frequency (MAF) of < 0.01.
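Steps (2) to (4) of this quality control can be sketched as a simple filter. The record layout (keys `rsid`, `chrom`, `ref`, `alt`, `maf`) and the function name are hypothetical, and the sketch assumes that every copy of a duplicated rs ID is dropped, which the text does not specify. Step (1), alignment to hg19, is not shown.

```python
from collections import Counter

AUTOSOMES = {str(c) for c in range(1, 23)}
BASES = {"A", "C", "G", "T"}

def qc_filter(variants, maf_min=0.01):
    """Keep biallelic autosomal SNPs with a unique rs ID and MAF >= maf_min.

    `variants` is a list of dicts with hypothetical keys:
    rsid, chrom, ref, alt, maf.
    """
    # Count rs IDs so that duplicated IDs can be removed (assumption: all copies).
    id_counts = Counter(v["rsid"] for v in variants if v.get("rsid"))
    kept = []
    for v in variants:
        rsid = v.get("rsid")
        if not rsid or id_counts[rsid] > 1:
            continue  # step 3: missing or duplicated rs ID
        if str(v["chrom"]) not in AUTOSOMES:
            continue  # step 2: autosomal only
        if v["ref"] not in BASES or v["alt"] not in BASES:
            continue  # step 2: single-base, biallelic SNPs only
        if v["maf"] < maf_min:
            continue  # step 4: MAF filter
        kept.append(v)
    return kept
```

For real summary statistics a data-frame library would usually be preferable; the plain-Python form is used here only to keep the logic explicit.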

Proteome-wide association study
PWAS was performed using the FUSION standard pipeline by integrating proteomic and genetic data (i.e. expression of plasma proteins in pQTL data and SNPs in GWAS data). For proteins with significant heritability (i.e. heritability P-value < 0.01), elastic net (ENet) predictive models were used to examine the effects of SNPs on protein abundance by evaluating the relationship of each gene with plasma proteins in IBD, UC and CD. After restricting the analysis to heritable proteins, the strength of association between plasma protein expression and SNPs in the cis-region of each protein was examined to predict the effects of SNPs on protein abundance. Linkage disequilibrium (LD) was controlled using an LD reference panel 34, thereby reducing its impact on the estimated test statistics (http://bogdan.bioinformatics.ucla.edu/software/twas/).
Based on the false discovery rate (FDR) threshold of < 5% and mapping window of ± 500 kb, Z-scores were evaluated to measure the genetic covariance between plasma protein expression and GWAS data 35. Based on the total number of imputation models for significant cis-heritable plasma proteins or protein complexes, a P-value of 3.7 × 10^−5 indicated significant genetic loci.
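As a worked illustration of these thresholds, the sketch below converts a Z-score to a two-sided P value, computes a Bonferroni-style threshold and applies the Benjamini-Hochberg procedure. The reading that the reported threshold corresponds to 0.05 divided by the 1,348 imputation models is an assumption that appears consistent with the reported value, and the text does not name the FDR procedure used, so Benjamini-Hochberg is assumed here. Function names are illustrative.

```python
import math

def z_to_p(z):
    """Two-sided P value for a standard normal Z-score."""
    return math.erfc(abs(z) / math.sqrt(2))

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a list of booleans marking tests significant at FDR < alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Largest rank k such that p_(k) <= alpha * k / m.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            cutoff_rank = rank
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[i] = True
    return significant

# Assumed reading of the reported threshold: 0.05 / 1,348 models.
threshold = bonferroni_threshold(0.05, 1348)
```

Under this assumption the computed threshold is about 3.7 × 10^−5, which agrees with the value quoted in the text.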

MR analysis
MR analysis, together with a series of sensitivity analyses, was performed to verify whether the abundance of PWAS-significant cis-regulated plasma proteins was causally associated with IBD, UC and CD and to determine candidate directional anchor plasma proteins. The MR analysis conforms to the STROBE-MR Statement 36 and mainly involved instrumental variable selection, instrumental variable assessment, MR analysis and sensitivity analysis.
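The text does not state which MR estimators were used. As a hedged illustration only, the sketch below shows two common choices for summary-level data: the Wald ratio for a single instrument and the fixed-effects inverse-variance weighted (IVW) estimate for several independent instruments. The variable names (`beta_pqtl` for SNP effects on protein abundance, `beta_gwas` and `se_gwas` for SNP effects on disease) are hypothetical.

```python
import math

def wald_ratio(beta_pqtl, beta_gwas, se_gwas):
    """Causal estimate and first-order standard error from one instrument."""
    return beta_gwas / beta_pqtl, se_gwas / abs(beta_pqtl)

def ivw_estimate(beta_pqtl, beta_gwas, se_gwas):
    """Fixed-effects IVW estimate from independent instruments.

    Equivalent to a weighted regression of beta_gwas on beta_pqtl through
    the origin, with weights 1 / se_gwas**2.
    """
    num = sum(bx * by / se**2 for bx, by, se in zip(beta_pqtl, beta_gwas, se_gwas))
    den = sum(bx**2 / se**2 for bx, se in zip(beta_pqtl, se_gwas))
    return num / den, math.sqrt(1.0 / den)

def two_sided_p(estimate, se):
    """Two-sided P value from a normal approximation."""
    return math.erfc(abs(estimate / se) / math.sqrt(2))
```

Sensitivity analyses (for example, heterogeneity or pleiotropy tests) would be run alongside such estimates and are not shown here.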

Figure 1. PWAS of IBD (A), UC (B) and CD (C). GWAS data were integrated with the plasma proteome (N = 1,348) using FUSION and are shown as Manhattan plots. Each point indicates a single association test between a plasma protein and IBD, UC or CD, with proteins ordered by genomic position on the x axis and association strength, expressed as the -log10 (P) of the z-score test, on the y axis. The cis-regulated abundance of 62, 21 and 30 plasma proteins correlated with IBD, UC and CD, respectively, and the 10 proteins with the strongest correlation are labelled in the figure. The red horizontal line represents the significance threshold for multiple-testing correction (FDR P < 0.05), which was set at the highest unadjusted P value below that threshold for IBD, UC and CD separately.

Figure 2. Association of protein expression in the blood with IBD (A), UC (B) and CD (C) risk. Forest plots show estimates of the relationship between genetically predicted protein levels and IBD, UC and CD.

Table 1. Candidate top 10 plasma proteins identified by PWAS analysis for IBD, UC and CD. PWAS: proteome-wide association study; ID: ID for the pQTL strongly associated with IBD/UC/CD; PWAS p-value: p-value for the PWAS association.